In this analysis I will be using exponential smoothing (Holt and Holt-Winters) and ARIMA models to forecast coffee prices, with monthly data from the FRED database of the Federal Reserve Bank of St. Louis going back to 1990. Let’s take a look at the data:
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(TSstudio)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
coffee_df <- read.csv("/Users/lawrence/Google Drive/DS/Time Series/coffee_prices/PCOFFOTMUSDM.csv")
coffee_df$DATE<-as.yearmon(coffee_df$DATE)
coffee_xts <- xts(coffee_df$PCOFFOTMUSDM,order.by = coffee_df$DATE)
ts_info(coffee_xts)
## The coffee_xts series is a xts object with 1 variable and 380 observations
## Frequency: monthly
## Start time: Jan 1990
## End time: Aug 2021
ts_plot(coffee_xts,
title = "Coffee Prices in USD cents per pound by year",
Ytitle = "USD cents/pound",
Xtitle = "Year")
Looking at the graph, we notice a recurring boom-bust pattern, with perhaps a slight upward trend. Let’s see what a time-series decomposition can tell us:
decompose(as.ts(coffee_xts)) %>% plot()
The decomposition reveals a fairly weak trend, with a peak around the late 1990s and early 2000s, a trough in the mid-2000s and another peak around 2011.
Let’s start off by fitting a Holt exponential smoothing model. Unlike the AR, MA and ARIMA family of models, exponential smoothing does not rely on the assumption of a stationary process, which is likely not met here. We can therefore fit the model without any differencing or log transformations.
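For reference, Holt’s linear trend method maintains a smoothed level and a smoothed trend, updated by the alpha and beta parameters that holt() reports below. A sketch of the standard recursions:

```latex
\begin{aligned}
\ell_t &= \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}) && \text{(level)}\\
b_t &= \beta (\ell_t - \ell_{t-1}) + (1-\beta)\, b_{t-1} && \text{(trend)}\\
\hat{y}_{t+h\mid t} &= \ell_t + h\, b_t && \text{(h-step forecast)}
\end{aligned}
```

An alpha near 1, as we will see in the fitted model, means the level essentially tracks the most recent observation, while a tiny beta means the trend estimate changes very slowly.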
I will use all data up to August 2020 (the first 368 observations) as training data and make predictions for the last 12 months in the series, September 2020 to August 2021.
#Fitting a Holt model
train <- window(coffee_xts,start=min(index(coffee_xts)),end=index(coffee_xts)[368])
test <- window(coffee_xts,start=index(coffee_xts)[369],end=max(index(coffee_xts)))
fc_holt <- holt(train, h = 12, initial = "optimal")
fc_holt$model
## Holt's method
##
## Call:
## holt(y = train, h = 12, initial = "optimal")
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 1e-04
##
## Initial states:
## l = 90.9197
## b = 0.2342
##
## sigma: 11.1066
##
## AIC AICc BIC
## 3952.104 3952.270 3971.644
accuracy(fc_holt, test)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.02907554 11.04610 7.374341 -0.3261507 5.597682 0.2193069
## Test set 9.88953573 21.80892 15.125739 4.4747258 7.828736 0.4498272
## ACF1
## Training set 0.1464713
## Test set NA
test_forecast(coffee_xts, forecast.obj = fc_holt, test = test)
The model appears to capture the upward trend relatively well. Let’s confirm that this is not just one-off luck by checking the model’s performance on more than a single test set, using an expanding-window approach (the time-series analogue of cross-validation). I chose the expanding window over a fixed-size rolling window because we do not have a huge number of data points.
Credit to Uber for this great illustration of how the expanding window works; their blog post has been a great help: https://eng.uber.com/forecasting-introduction/ .
We will start with just the first 10 years of data (which seem to be a good representation of the overall series, rather than just the upward trend we would capture with, say, only the first five years). We will then expand the window, forecasting the next 12 months at each step.
start <- min(index(coffee_xts)) #First time-stamp
end <- index(coffee_xts)[116] #Time stamp after 10 years
# now use those parameters to make a vector of the time steps at which each
# window will end
steps <- seq(from = 117, to=length(coffee_xts), by = 12)
# using lapply, iterate the forecasting-and-scoring process over the
# windows we created
scores <- lapply(steps, function(x) {
train <- coffee_xts[1:(x - 1)]
test <- coffee_xts[x:(x+11)]
model <- holt(train, h = 12, initial = "optimal")
fcst <- forecast(model, h = 12)
accuracy(fcst, test)
})
rmse_holt <- mean(unlist(lapply(scores, '[[', "Test set","RMSE")))
rmse_holt
## [1] 23.5429
When using expanding-window validation, the RMSE turns out to be slightly worse than with just a single test set. This is not too surprising, as it is likely a more realistic estimate of the model’s performance on new data.
scores <- lapply(steps, function(x) {
train <- coffee_xts[1:(x - 1)]
test <- coffee_xts[x:(x+11)]
model <- holt(train, h = 12, initial = "optimal")
fcst <- forecast(model, h = 12)
test_forecast(coffee_xts[1:(x + 11)] , forecast.obj = fcst, test = test)
})
par(mfrow=c(4,6))
scores
## (printed list of 22 expanding-window forecast plots, one per window)
Let’s try fitting a more complex Holt-Winters model instead:
#Fitting a Holt-Winters model
train <- window(coffee_xts,start=min(index(coffee_xts)),end=index(coffee_xts)[368])
test <- window(coffee_xts,start=index(coffee_xts)[369],end=max(index(coffee_xts)))
coffee_hw <- HoltWinters(train)
coffee_hw
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = train)
##
## Smoothing parameters:
## alpha: 0.9673651
## beta : 0
## gamma: 1
##
## Coefficients:
## [,1]
## a 165.4996670
## b -0.4025304
## s1 1.8625886
## s2 1.1920594
## s3 -1.0629192
## s4 -2.4406946
## s5 -1.3640668
## s6 0.9354580
## s7 2.7668083
## s8 3.8466633
## s9 2.9001371
## s10 -0.4763221
## s11 -1.5721914
## s12 0.7429645
coffee_fc <- forecast(coffee_hw, h = 12)
accuracy(coffee_fc, test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.6309571 11.54192 7.731166 0.2658334 5.927505 0.2299186
## Test set 14.1516951 25.74196 17.571196 6.7867824 9.003306 0.5225531
## ACF1
## Training set 0.1496691
## Test set NA
test_forecast(actual = coffee_xts,
forecast.obj = coffee_fc,
test = test)
The simpler Holt model actually appears to capture the data better than the more complex Holt-Winters model: the test-set RMSE is 21.8 for the Holt model versus 25.7 for Holt-Winters.
Let’s see if we can optimise the model parameters to make the model fit better:
shallow_grid <- ts_grid(as.ts(train),
model = "HoltWinters",
periods = 6,
window_space = 6,
window_test = 12,
hyper_params = list(alpha = seq(0,1,0.1),
beta = seq(0,1,0.1),
gamma = seq(0,1,0.1)),
parallel = TRUE,
n.cores = 8)
## Warning in ts_grid(as.ts(train), model = "HoltWinters", periods = 6,
## window_space = 6, : The value of the 'alpha' parameter cannot be equal to 0
## replacing 0 with 1e-5
## Warning in ts_grid(as.ts(train), model = "HoltWinters", periods = 6,
## window_space = 6, : The value of the 'beta' parameter cannot be equal to 0
## replacing 0 with 1e-5
## Warning in ts_grid(as.ts(train), model = "HoltWinters", periods = 6,
## window_space = 6, : The value of the 'gamma' parameter cannot be equal to 0
## replacing 0 with 1e-5
## Warning: Strategy 'multiprocess' is deprecated in future (>= 1.20.0). Instead,
## explicitly specify either 'multisession' or 'multicore'. In the current R
## session, 'multiprocess' equals 'multisession'.
## Warning in supportsMulticoreAndRStudio(...): [ONE-TIME WARNING] Forked
## processing ('multicore') is not supported when running R from RStudio
## because it is considered unstable. For more details, how to control forked
## processing or not, and how to silence this warning in future R sessions, see ?
## parallelly::supportsMulticore
shallow_grid$grid_df[1:10,]
## alpha beta gamma 1 2 3 4 5 6
## 1 1e-05 3e-01 0.2 3.259949 2.957065 4.319846 4.914546 7.227507 12.683663
## 2 1e-05 4e-01 0.2 3.064788 3.056210 4.587374 5.127167 7.180967 12.384058
## 3 1e-05 5e-01 0.2 2.902599 3.185806 4.852596 5.337895 7.134910 12.087406
## 4 1e-05 2e-01 0.2 3.484681 2.918825 4.118109 4.734639 7.274533 12.986245
## 5 1e-05 6e-01 0.2 2.741683 3.325441 5.126647 5.546744 7.089332 11.793680
## 6 1e-05 1e-01 0.2 3.711201 2.909436 3.973622 4.582827 7.322051 13.291831
## 7 1e-05 7e-01 0.2 2.624255 3.514323 5.448750 5.753729 7.044391 11.502857
## 8 5e-01 3e-01 0.1 3.107215 2.112685 4.584414 11.859944 9.258243 5.000949
## 9 1e-05 1e-05 0.2 3.960820 2.921206 3.827892 4.455120 7.395545 13.600415
## 10 1e-05 8e-01 0.2 2.522319 3.701614 5.780206 5.970728 7.064689 11.279777
## mean
## 1 5.893762
## 2 5.900094
## 3 5.916869
## 4 5.919505
## 5 5.937255
## 6 5.965161
## 7 5.981384
## 8 5.987242
## 9 6.026833
## 10 6.053222
plot_grid(shallow_grid)
Let’s refine the search space a little given this output:
deep_grid <- ts_grid(as.ts(train),
model = "HoltWinters",
periods = 6,
window_space = 6,
window_test = 12,
hyper_params = list(alpha = seq(0,0.5,0.01),
beta = seq(0,0.8,0.01),
gamma = seq(0.1,0.2,0.01)),
parallel = TRUE,
n.cores = 8)
## Warning in ts_grid(as.ts(train), model = "HoltWinters", periods = 6,
## window_space = 6, : The value of the 'alpha' parameter cannot be equal to 0
## replacing 0 with 1e-5
## Warning in ts_grid(as.ts(train), model = "HoltWinters", periods = 6,
## window_space = 6, : The value of the 'beta' parameter cannot be equal to 0
## replacing 0 with 1e-5
plot_grid(deep_grid)
Let’s see how our model performs with the newly tuned parameters:
coffee_hw_grid <- HoltWinters(train,
alpha = deep_grid$alpha,
beta = deep_grid$beta,
gamma = deep_grid$gamma)
fc_hw_grid <- forecast(coffee_hw_grid, h = 12)
accuracy(fc_hw_grid, test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.03505119 15.39042 10.14099 0.1661806 7.542112 0.3015849
## Test set 10.38485723 19.44346 12.83573 4.9636213 6.535817 0.3817242
## ACF1
## Training set 0.598813
## Test set NA
test_forecast(actual = coffee_xts,
forecast.obj = fc_hw_grid,
test = test)
Optimising the model’s parameters brings the test RMSE down to 19.44, a real improvement. This shows in the graph as well: the model appears to do a much better job of predicting the future price movement.
Let’s see how this model performs with our expanding-window cross-validation. A caveat to keep in mind: the model has already seen all the data up to August 2020 during the tuning process, so there may be an optimistic bias when using an expanding window. Let’s see how it performs anyway:
start <- min(index(coffee_xts)) #First time-stamp
end <- index(coffee_xts)[116] #Time stamp after 10 years
# now use those parameters to make a vector of the time steps at which each
# window will end
steps <- seq(from = 117, to=length(coffee_xts), by = 12)
# using lapply, iterate the forecasting-and-scoring process over the
# windows we created
scores <- lapply(steps, function(x) {
train <- coffee_xts[1:(x - 1)]
test <- coffee_xts[x:(x+11)]
model <- HoltWinters(train,
alpha = deep_grid$alpha,
beta = deep_grid$beta,
gamma = deep_grid$gamma)
fcst <- forecast(model, h = 12)
accuracy(fcst, test)
})
rmse_hw_tuned <- mean(unlist(lapply(scores, '[[', "Test set","RMSE")))
rmse_hw_tuned
## [1] 27.52651
We get an RMSE of 27.53, quite a bit higher than when using just the Sep 2020 to Aug 2021 test period as before. Let’s see how the standard, untuned Holt-Winters model performs with the expanding-window approach:
scores <- lapply(steps, function(x) {
train <- coffee_xts[1:(x - 1)]
test <- coffee_xts[x:(x+11)]
model <- HoltWinters(train)
fcst <- forecast(model, h = 12)
accuracy(fcst, test)
})
## Warning in HoltWinters(train): optimization difficulties: ERROR:
## ABNORMAL_TERMINATION_IN_LNSRCH
rmse_hw <- mean(unlist(lapply(scores, '[[', "Test set","RMSE")))
rmse_hw
## [1] 24.55856
Using the unoptimised, out-of-the-box HoltWinters model instead improves the RMSE to 24.56, a classic case of overfitting to the training data, it seems. Surprisingly, the simpler Holt model performed even better, with an RMSE of 23.54. Sometimes simpler really is better.
Let’s see how an ARIMA model compares. I will start by looking at the ACF (autocorrelation) and PACF (partial autocorrelation) plots:
train <- window(coffee_xts,start=min(index(coffee_xts)),end=index(coffee_xts)[368])
test <- window(coffee_xts,start=index(coffee_xts)[369],end=max(index(coffee_xts)))
#Fitting an ARIMA model
par(mfrow=c(1,2))
acf(train, lag.max = 60)
pacf(train, lag.max = 60)
The ACF plot shows the correlation of the series with its lags decaying slowly and roughly linearly, a sign of non-stationarity. Differencing the series removes both the trend and much of this correlation with its lags.
The ACF and PACF plots of the first difference of the series indicate that an AR(1) process could be appropriate for the differenced series, since the ACF tails off and the PACF cuts off after the first lag. We will run another grid search to determine the optimal parameters for a seasonal ARIMA model:
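The diagnostics for the differenced series can be reproduced directly; a quick sketch, reusing the `train` series defined above:

```r
# difference the training series once and drop the leading NA
train_diff <- na.omit(diff(as.ts(train)))

# ACF tails off gradually; PACF cuts off after the first lag,
# consistent with an AR(1) structure on the differenced series
par(mfrow = c(1, 2))
acf(train_diff, lag.max = 60)
pacf(train_diff, lag.max = 60)
```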
p <- q <- P <- Q <- 0:2
arima_grid <- expand.grid(p,q,P,Q)
names(arima_grid) <- c("p", "q", "P", "Q")
arima_grid$d <- 1
arima_grid$D <- 1
arima_grid$k <- rowSums(arima_grid)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:xts':
##
## first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
arima_grid <- arima_grid %>% filter(k <= 7)
arima_search <- lapply(1:nrow(arima_grid), function(i){
md <- NULL
md <- arima(train, order = c(arima_grid$p[i], 1, arima_grid$q[i]),
seasonal = list(order = c(arima_grid$P[i], 1, arima_grid$Q[i])))
results <- data.frame(p = arima_grid$p[i], d = 1, q = arima_grid$q[i],
P = arima_grid$P[i], D = 1, Q = arima_grid$Q[i],
AIC = md$aic)
}) %>% bind_rows() %>% arrange(AIC)
## Warning in arima(train, order = c(arima_grid$p[i], 1, arima_grid$q[i]), :
## possible convergence problem: optim gave code = 1
## Warning in arima(train, order = c(arima_grid$p[i], 1, arima_grid$q[i]), :
## possible convergence problem: optim gave code = 1
head(arima_search)
## p d q P D Q AIC
## 1 0 1 2 0 1 1 2748.403
## 2 2 1 0 0 1 1 2748.800
## 3 2 1 2 0 1 1 2749.656
## 4 0 1 2 0 1 2 2749.970
## 5 0 1 2 1 1 1 2749.975
## 6 1 1 2 0 1 1 2750.186
The first few models look quite similar in terms of their AIC scores. Let’s see how the top three perform under expanding-window cross-validation:
start <- min(index(coffee_xts)) #First time-stamp
end <- index(coffee_xts)[116] #Time stamp after 10 years
# now use those parameters to make a vector of the time steps at which each
# window will end
steps <- seq(from = 117, to=length(coffee_xts), by = 12)
# using lapply, iterate the forecasting-and-scoring process over the
# windows we created
scores <- lapply(steps, function(x) {
train <- coffee_xts[1:(x - 1)]
test <- coffee_xts[x:(x+11)]
model <- arima(train, order = c(arima_search[1,1],arima_search[1,2],arima_search[1,3]),
seasonal = c(arima_search[1,4],arima_search[1,5],arima_search[1,6]))
fcst <- forecast(model, h = 12)
accuracy(fcst, test)
})
rmse_mod1 <- mean(unlist(lapply(scores, '[[', "Test set","RMSE")))
rmse_mod1
## [1] 24.18696
scores <- lapply(steps, function(x) {
train <- coffee_xts[1:(x - 1)]
test <- coffee_xts[x:(x+11)]
model <- arima(train, order = c(arima_search[2,1],arima_search[2,2],arima_search[2,3]),
seasonal = c(arima_search[2,4],arima_search[2,5],arima_search[2,6]))
fcst <- forecast(model, h = 12)
accuracy(fcst, test)
})
rmse_mod2 <- mean(unlist(lapply(scores, '[[', "Test set","RMSE")))
rmse_mod2
## [1] 24.09081
scores <- lapply(steps, function(x) {
train <- coffee_xts[1:(x - 1)]
test <- coffee_xts[x:(x+11)]
model <- arima(train, order = c(arima_search[3,1],arima_search[3,2],arima_search[3,3]),
seasonal = c(arima_search[3,4],arima_search[3,5],arima_search[3,6]))
fcst <- forecast(model, h = 12)
accuracy(fcst, test)
})
rmse_mod3 <- mean(unlist(lapply(scores, '[[', "Test set","RMSE")))
rmse_mod3
## [1] 24.56284
Looks like AIC was right and the top three models really are not much different from each other. Model 2 performs best under cross-validation, so I will move forward with it:
train <- window(coffee_xts,start=min(index(coffee_xts)),end=index(coffee_xts)[368])
test <- window(coffee_xts,start=index(coffee_xts)[369],end=max(index(coffee_xts)))
coffee_best_mod <- arima(train, order = c(arima_search[2,1],arima_search[2,2],arima_search[2,3]),
seasonal = c(arima_search[2,4],arima_search[2,5],arima_search[2,6]))
coffee_best_mod
##
## Call:
## arima(x = train, order = c(arima_search[2, 1], arima_search[2, 2], arima_search[2,
## 3]), seasonal = c(arima_search[2, 4], arima_search[2, 5], arima_search[2,
## 6]))
##
## Coefficients:
## ar1 ar2 sma1
## 0.1369 0.1191 -1.0000
## s.e. 0.0527 0.0527 0.0461
##
## sigma^2 estimated as 117.6: log likelihood = -1370.4, aic = 2748.8
coffee_test_fc <- forecast(coffee_best_mod, h = 12)
accuracy(coffee_test_fc, test)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.06745962 10.64917 7.200495 -0.1458572 5.496339 0.2141369
## Test set 4.90500109 19.36887 14.387215 1.6960009 7.606218 0.4278641
## ACF1
## Training set 0.004953077
## Test set NA
Again, the RMSE on the single test set (19.37) is noticeably lower than under expanding-window cross-validation (24.09). Some overfitting, or luck, is likely at play again.
test_forecast(coffee_xts,
forecast.obj = coffee_test_fc,
test = test)
checkresiduals(coffee_best_mod)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,0)(0,1,1)[12]
## Q* = 21.017, df = 21, p-value = 0.4579
##
## Model df: 3. Total lags used: 24
plot_forecast(coffee_test_fc)
Let’s see how an automatically tuned ARIMA model using auto.arima compares:
coffee_auto_mod <- auto.arima(train)
coffee_auto_mod
## Series: train
## ARIMA(2,1,0)
##
## Coefficients:
## ar1 ar2
## 0.1337 0.1146
## s.e. 0.0519 0.0519
##
## sigma^2 estimated as 118.1: log likelihood=-1395.3
## AIC=2796.59 AICc=2796.66 BIC=2808.31
coffee_auto_fc <- forecast(coffee_auto_mod, h = 12)
accuracy(coffee_auto_fc, test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1951992 10.82143 7.196412 -0.01366291 5.475269 0.2140155
## Test set 6.2972685 20.86656 14.818239 2.40140398 7.800935 0.4406824
## ACF1
## Training set 0.004172294
## Test set NA
And again using expanding-window cross-validation:
start <- min(index(coffee_xts)) #First time-stamp
end <- index(coffee_xts)[116] #Time stamp after 10 years
# now use those parameters to make a vector of the time steps at which each
# window will end
steps <- seq(from = 117, to=length(coffee_xts), by = 12)
# using lapply, iterate the forecasting-and-scoring process over the
# windows we created
scores <- lapply(steps, function(x) {
train <- coffee_xts[1:(x - 1)]
test <- coffee_xts[x:(x+11)]
model <- auto.arima(train)
fcst <- forecast(model, h = 12)
accuracy(fcst, test)
})
rmse_auto <- mean(unlist(lapply(scores, '[[', "Test set","RMSE")))
rmse_auto
## [1] 24.40014
Conclusion:
rmse<-c(rmse_holt,rmse_hw,rmse_hw_tuned,rmse_mod1,rmse_mod2,rmse_mod3,rmse_auto)
labels<-c("Holt","Holt-Winters","Holt-Winters Tuned","Arima Mod 1","Arima Mod 2","Arima Mod 3","Arima Auto")
df <- data.frame(rmse,labels)
df <- df[order(rmse),]
df
## rmse labels
## 1 23.54290 Holt
## 5 24.09081 Arima Mod 2
## 4 24.18696 Arima Mod 1
## 7 24.40014 Arima Auto
## 2 24.55856 Holt-Winters
## 6 24.56284 Arima Mod 3
## 3 27.52651 Holt-Winters Tuned
Having tested a variety of models, it appears that the simple Holt model does the best job of predicting future data, as measured by RMSE. Occam’s razor, simpler is better, really does seem to hold here.